To make a long story short, this morning Rand got in touch with Google and was advised that changing the URL so it doesn’t end in “.0” would be a wise decision. Google would prefer not to make an official or public comment, but they did give us permission to share this tidbit. Naturally, we dug deeper, and found that it’s not just inadvisable, but literally impossible to get a URL indexed in Google’s engine if it ends with a .0 (similar to how Google won’t index files with extensions such as .exe or .tgz).
Whilst there is plenty of evidence that URLs ending in .0 often belong to spam pages (wild guess here, but let’s say there are 800,000 or so URLs on the web ending in a “.0” and maybe, oh… I don’t know, 0.5% of them are worth indexing), I’m not sure that this is a good metric by which to impose an immediate penalty. Some other decent pages that have been hit in a similar way include http://en.wikipedia.org/wiki/Windows_1.0, which enjoys a healthy number of backlinks but won’t appear in Google. Another page, http://en.wikipedia.org/wiki/Web_2.0, appears in Google’s index as http://en.wikipedia.org/wiki/Web_2. None of the URLs that redirect to a version with a trailing slash are flagged.
Increasingly fascinated by this, we did some investigating. What we discovered was that this penalty is indeed limited to the number zero: URLs ending in .n, where “n” is any other number, are not removed. If Google finds a version of the page that resolves with a trailing slash, you’ll avoid the penalty. In one instance, a page that resolved with underscores in place of the full stops was indexed.
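To make the pattern concrete, here is a minimal Python sketch of the rule as we observed it. It is only our reading of the behaviour, not anything Google has confirmed, and the function name `classify_url` and its three labels are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches a path that ends in a dot followed by digits, e.g. "_1.0" or "_10.3".
_TRAILING_DOT_NUMBER = re.compile(r"\.(\d+)$")


def classify_url(url: str) -> str:
    """Label a URL by how its path ends (hypothetical labels).

    'dot-zero'  : path ends in ".0" (the pattern we saw missing from Google)
    'dot-other' : path ends in a dot plus some other number, e.g. ".3"
    'none'      : anything else, including a ".0" followed by a trailing slash
    """
    # urlsplit only finds the path reliably when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    match = _TRAILING_DOT_NUMBER.search(path)
    if match is None:
        return "none"
    return "dot-zero" if match.group(1) == "0" else "dot-other"


if __name__ == "__main__":
    for candidate in (
        "en.wikipedia.org/wiki/Windows_1.0",
        "www.mythtv.org/wiki/index.php/Opensuse_10.3",
        "drupal.org/drupal-5.0-beta1",
        "en.wikipedia.org/wiki/Web_2.0/",
    ):
        print(candidate, "->", classify_url(candidate))
```

Only the path is inspected, so query strings and fragments are ignored. Whether Google treats those the same way is something we have not tested.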
Below is an assortment of URLs which are indexed in Yahoo! (and many also in Live), but which show no PageRank and do not appear in Google’s index. Below those, I’ve listed very similar pages that are indexed, but which do not end in .0.
Out of Google’s Index (but in Yahoo!):
- en.wikipedia.org/wiki/Windows_1.0
- en.wikipedia.org/wiki/Web_2.0
- http://en.wikipedia.org/wiki/Die_Hard_4.0
- drupal.org/drupal-5.0
- keznews.com/3799_Vista_Transformation_Pack_8.0_Final_-_VTP_8.0
- en.wikipedia.org/wiki/BASIC_8.0
- drupal.org/drupal-6.0
- en.opensuse.org/OpenSUSE_11.0
- www.shopping.com/xGS-Illustrator_11.0
- www.mythtv.org/wiki/index.php/Opensuse_11.0
- www.shopping.com/xGS-Suse_9.0
- en.wikipedia.org/wiki/Mac_OS_X_10.0
- en.opensuse.org/Bugs:Most_Annoying_Bugs_10.0
In the index:
- en.wikipedia.org/wiki/Web_2
- drupal.org/drupal-5.0-beta1
- http://keznews.com/3799_Vista_Transformation_Pack_8_0_Final_-_VTP_8_0
- drupal.org/drupal-6.0-beta1
- www.mythtv.org/wiki/index.php/Opensuse_10.3
- www.mythtv.org/wiki/index.php/Opensuse_10.2
- en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3
This page has PageRank (it shows a PR 3), but didn’t show up in a Google search: http://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0-
http://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0 is not indexed and has no PageRank. Call this duplicate content if you will, but it still shows the same trend in action.
You’ll notice some interesting things, such as the fact that en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3 is indexed but the 10.0 version of the same page, listed above among the URLs missing from Google, is not.
Quite simply, making sure a page resolves with a trailing slash will avoid this problem. I’m of the opinion that this is a pretty silly thing to penalise for without some sort of human review, but it’s important that we pick up on things like this so that we can avoid such “false positive” penalties. Make sure to add “check for URLs ending in .0” to your next checklist for site reviews and please, do share in the comments if you’ve found any other filename extensions that trigger similar behaviour from any of the engines.
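If you want to automate that checklist item, something along these lines would do as a starting point. It is a rough sketch that assumes you already have a list of your site’s URLs (from a crawl or a sitemap, say). The function name `audit_urls` is hypothetical, and the suggested trailing-slash URL only helps if your server actually resolves it.

```python
from urllib.parse import urlsplit, urlunsplit


def audit_urls(urls):
    """Yield (url, suggestion) pairs for every URL whose path ends in ".0".

    The suggestion is the same URL with a trailing slash appended to the
    path. It is only a candidate: the site still has to serve that URL.
    """
    for url in urls:
        has_scheme = "://" in url
        # Add a temporary scheme so urlsplit can separate host and path.
        parts = urlsplit(url if has_scheme else "http://" + url)
        if parts.path.endswith(".0"):
            fixed = urlunsplit(parts._replace(path=parts.path + "/"))
            if not has_scheme:
                # Drop the temporary scheme again to mirror the input style.
                fixed = fixed[len("http://"):]
            yield url, fixed


if __name__ == "__main__":
    site_urls = [
        "drupal.org/drupal-5.0",
        "drupal.org/drupal-5.0-beta1",
        "http://en.wikipedia.org/wiki/Die_Hard_4.0",
    ]
    for original, suggestion in audit_urls(site_urls):
        print(f"{original} -> consider {suggestion}")
```

Replacing the full stop with an underscore, as one of the indexed pages above does, is another option, but that changes the URL itself rather than just how it resolves.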
UPDATE:
en.wikipedia.org/wiki/SAML_1.1 also seems to be suffering from a penalty, so it will be useful to go through some more URLs that end in .n to gauge whether or not they’re penalised. Most of the examples we saw that didn’t involve a zero had not been hit in any way. I’d love to know how extensive this filter really is.